Foreign language acquisition of perceptually similar segments: evidence from Lower Sorbian

Lower Sorbian is a moribund language spoken in Eastern Germany that features a three-way sibilant contrast, /s, ʂ, ɕ/. The vast majority of L1 speakers are above eighty years of age and virtually no young Sorbians learn Lower Sorbian as their first language. There are language revitalization programs in place, but as a result virtually all younger Lower Sorbian speakers are L2 learners whose first language is German. German, as opposed to Lower Sorbian, has a two-way sibilant contrast, /s, ʃ/. Lower Sorbian learners therefore need to acquire a perceptually similar sibilant contrast, /ʂ, ɕ/, whose members commonly assimilate to a single L1 segment, /ʃ/. This two-to-one assimilation makes acquisition difficult. In this project, I examine the acquisition of the three-way sibilant contrast using ultrasound technology. The ultrasound data revealed that learners in the contemporary context do not produce a distinction between /ʂ, ɕ/, and that only advanced learners who had significant exposure to L1 speakers have acquired a three-way sibilant distinction. The findings are put into the context of models of L2 acquisition, and general implications for foreign language acquisition are discussed.


Introduction
Lower Sorbian is a West Slavic language spoken in Eastern Germany. It is a moribund language (Moseley, 2012) and is spoken near the border of Poland (Stone, 1993). The vast majority of first language Lower Sorbian speakers are above 80 years of age. Complicating matters further, the situation of the language is quite precarious. The majority of first language speakers do not use their mother tongue in daily communication, which has led to certain degrees of language attrition. Additionally, nearly every young speaker of Lower Sorbian is a second language learner and acquires the language at school. For example, the Witaj program is a kindergarten curriculum which incorporates Lower Sorbian into the students' education. Following that, many students attend the Dolnoserbski gymnazium Chóśebuz, situated in Cottbus (Marti, 2007). The school runs up to grade 12 and includes Lower Sorbian as a mandatory aspect of education. While the education can be beneficial, it is difficult to find qualified teachers for the school and, due to the advanced age of the L1 speakers, teachers are typically second language speakers themselves.
Lower Sorbian has a cross-linguistically uncommon three-way contrast among sibilant fricatives (found in fewer than approximately 6% of the world's languages; Maddieson, 1984) that makes contrasts at the dental/alveolar, /s, z/, retroflex, /ʂ, ʐ/, and alveolopalatal, /ɕ, ʑ/, places of articulation, similar to the contrasts observed in Modern Polish (Żygis, 2003). The contrast contains two sibilants, /ʂ, ɕ/, that share acoustic-perceptual similarities with /ʃ/. Many theories of language acquisition, such as the PAM-L2 (Best & Tyler, 2007) and the SLM-r (Flege & Bohn, 2021), have postulated how different aspects of acoustic-perceptual similarity with L1 segments impact L2 acquisition. Under these theories, contrasts such as this three-way contrast are the most difficult to acquire because of the acoustic similarities between the segments. This makes Lower Sorbian an excellent language in which to examine foreign language acquisition of sibilant fricatives.

Second language acquisition
The PAM-L2
The Perceptual Assimilation Model of L2 Acquisition (PAM-L2; Best & Tyler, 2007) is an extension of the Perceptual Assimilation Model (PAM; Best, 1995) to second language acquisition. The PAM-L2 is a direct realist model, which assumes that listeners perceive distal articulatory events (i.e., changes in vocal tract configurations), not specific acoustic patterns. Under the view of the PAM-L2, perceptual learning can take place on multiple levels, including phonological, phonetic, or gestural. One way in which category acquisition can occur is when there are two L2 segments that assimilate to two separate L1 segments (two-category assimilation). The PAM-L2 predicts good to excellent discrimination in this context. Learners then continue to acquire L2 vocabulary using the assimilated categories. This leads to a common L1-L2 phonological category for each of the L2 segments. However, if there is a perceptible phonetic difference between L1-L2 pairs of segments, then it is possible that this difference becomes perceptibly stronger for the learner with time. If the differences between L1-L2 pairs become perceptible enough, then separate L1 and L2 phonetic categories can emerge. However, if the distinction is not strong enough, the learner will not develop separate L2 categories (Tyler, 2019). This process is assumed to occur very early in acquisition, although it may strengthen over time. Best et al. (2009) suggest the process of perceptual attunement is tightly related to vocabulary acquisition. Bundgaard-Nielsen et al.'s (2012) Vocabulary-Tuning Model of L2 Rephonologization posits that an increase in vocabulary size drives perceptual attunement to L2 phonological structure. Support for this position was found by Bundgaard-Nielsen, Best, & Tyler (2011a, 2011b); however, Tyler (2019) suggests that an increase in vocabulary might support the acquisition of more discriminable L1-L2 pairs but could inhibit less discriminable pairs. Thus, Tyler (2019) suggests that the opportunity for phonetic learning is likely before the L2 vocabulary exceeds 50 words. He supports this position by comparison with L1 acquisition: children build their vocabulary slowly up to around 50 words, after which a rapid increase in vocabulary occurs (e.g., Nazzi & Bertoncini, 2003). For Tyler (2019), after phonetic attunement takes place, vocabulary growth ramps up dramatically. Thus, learning a large vocabulary prior to phonetic attunement to difficult-to-perceive contrasts greatly hinders acquisition.
In the case of Lower Sorbian acquisition, there are two segments of interest, /ʂ, ɕ/, which are perceptually similar to each other and to the same L1 segment, /ʃ/. According to the PAM-L2, this is single-category assimilation and poor discrimination is predicted. There may, however, still be a relative goodness-of-fit difference between the two assimilated segments that allows learners to discriminate between them and thus acquire the L2 segments. The PAM-L2 and its predictions focus on learners in an immersion environment (second language acquisition; SLA); learning a second language in the learner's L1 environment through L2 classes (foreign language acquisition; FLA) differs from immersion learning (Tyler, 2019). Nonetheless, the PAM-L2 can offer potential insights into FLA. Tyler (2019) suggests that single-category assimilations (i.e., two L2 segments assimilating to the same L1 segment) are even more unlikely to be acquired in the classroom setting. The reason for this is reduced access to consistent stimuli and to the phonetic contrasts that distinguish them. Many second language classrooms are also taught by second language speakers, who may or may not consistently produce the relevant contrasts of the language, and who likely produce contrasts differently from the older generation of L1 speakers. Additionally, there is extensive acoustic-perceptual input from other second language learners, who also may not produce a target contrast. Tyler (2019) also notes that vocabulary is acquired faster in the classroom than in immersion and L1 contexts, which could impact perceptual acquisition.

The Speech Learning Model
The speech learning model (SLM; Flege, 1995) and the revised speech learning model (SLM-r; Flege & Bohn, 2021) have also been frontrunners among second language acquisition theories. The SLM was primarily designed to account for age-related differences in language acquisition, while the SLM-r aims at providing an explanation for how reorganization of the phonetic system occurs over the life-span due to naturalistic L2 learning.
The SLM posits that for late-acquiring bilinguals (i.e., those who acquired two languages as children, with the second acquired later than the first), L2 phonetic learning is influenced by acoustic-perceptual similarities between L2 and L1 phonetics. Thus, L1 and L2 segments become perceptually linked together. Specifically, during L2 learning, segments "map onto" perceptually similar L1 sounds. The ability of L2 learners to discern perceptually linked sounds develops gradually, rather than rapidly; however, once it has developed, formation of a novel phonetic category can occur.
The mechanisms for novel category formation that guide L1 acquisition are believed to be intact and available for L2 learning. In L1 acquisition, this process is slow and begins as a set of equivalence classes (Kuhl, 1983) that involves grouping acoustically similar sounds together. This development continues long after establishing a phonetic inventory (Lee et al., 1999) and extends at least beyond the age of seven years (Bent, 2014). The SLM proposes that L2 learners of any age form acoustic-perceptual equivalence classes from the statistical properties of the input distributions of their exposure to the target L2. However, unlike L1 category formation, which has no previous language exposure and categories to interfere with it, L2 category formation relies on disruption of L2-to-L1 perceptual links through the ability to discern phonetic differences between perceptually similar L2 and L1 segments. Flege & Bohn (2021) suggest that L2 category formation should take at least as long as L1 category formation.
According to the SLM, L2 category formation depends on the degree of acoustic-perceptual similarity between the L2 segment and the closest L1 sound. That is, the more similar it is to an L1 segment, the harder it will be to form a new L2 category. Additionally, age of acquisition plays a significant role, with older learners having lower probabilities of forming new categories.
The SLM-r (Flege & Bohn, 2021) maintains that there is no difference in how L2 segments are acquired compared to L1 acquisition. The SLM-r posits that observed differences in L2 acquisition, and subsequently in the production and perception of L2 segments, arise because L2 sounds are initially linked to L1 segments, which serve as substitutes, especially in early learning. As a result, the existing L1 phonetic categories interfere with and can even block the formation of novel categories. Additionally, L2 acquisition typically involves a different set of input stimuli, which often includes foreign-accented L2 speech.
The SLM-r distinguishes itself from the PAM-L2 in that it posits that the delinking process can be facilitated by growth of an L2 lexicon (Bundgaard-Nielsen et al., 2011a; Bundgaard-Nielsen et al., 2011b), whereas the PAM-L2 holds that growth of the L2 lexicon (beyond perhaps ~50 words) serves to stagnate L2 category formation, at least in the case of hard-to-discriminate L1 and L2 segments (Tyler, 2019). In this sense, the SLM-r puts forth that category formation is a much longer and more drawn-out process (Flege & Bohn, 2021), while the PAM-L2 suggests it is a quicker process with a narrow opportunity for learners to acquire a new category (Tyler, 2019). Additionally, the PAM-L2 posits that learners attend to gestural movements in the vocal tract, while the SLM-r suggests that learners pay attention to acoustic differences in the input signal directly. Thus, under the view of the SLM-r, articulation is a matter of better navigation of what vocal tract shapes produce the target acoustic outputs.

Hypothesis
Based on both the PAM-L2 and SLM-r, the anticipated pattern of L2 segment assimilation is that learners will assimilate both Lower Sorbian /ʂ/ and /ɕ/ to German /ʃ/. This is due to the acoustic-perceptual similarities between them. The acoustics of the two segments /ʂ, ɕ/ resemble each other in centre of gravity (COG) and skewness, with both having a lower COG and higher skewness than /s/. Both values also significantly overlapped with each other for /ʂ, ɕ/. The feature in Lower Sorbian that was found to most strongly distinguish /ʂ, ɕ/ from each other was a much higher transitional F2 into the following vowel for /ɕ/ compared to /ʂ/ (Howson, 2015). The lower COG values observed in Lower Sorbian tend to match the cross-linguistic COG associated with /ʃ/ (Żygis, 2010) and the COG and skewness measures associated with German /ʃ/ (Weirich & Simpson, 2015). Thus, I expect that low level (i.e., A-level) learners will share tongue contours for /ʂ, ɕ/ and that both will resemble /ʃ/. It remains possible that there are still goodness-of-fit (or phonetically discernible) differences between /ʂ, ɕ/ and /ʃ/. More specifically, /ɕ/ has formant transitions and spectral characteristics similar to /ʃ/, while /ʂ/ has similar spectral characteristics, but different formant transitions. Thus, I expect that more advanced learners of Lower Sorbian will initially differentiate /ʂ/ from /ɕ, ʃ/ because of the stronger acoustic-perceptual dissimilarities. In terms of the PAM-L2 (Best & Tyler, 2007), the assumption is that learners are perceiving articulatory gestures and vocal tract changes, not more abstract acoustic characteristics. The implication of this is that as learners become more advanced, they become better at retrieving the articulatory movements necessary to produce a contrast. The expectation is that gradual improvement in the articulation of L2 segments should occur. In terms of the SLM-r (Flege & Bohn, 2021), there is a similar expectation. As learners' acoustic-perceptual representation improves, so too should articulation. However, because perceptual (and articulatory) dissimilarities may take more time to pick up on (Flege & Bohn, 2021), I predict that only more advanced learners will have acquired these contrasts.
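The two spectral measures referred to in this section, centre of gravity (COG) and skewness, are the first and third moments of a fricative's power spectrum. As a minimal sketch, assuming the spectrum is available as paired lists of frequencies and power values, they could be computed as below. The function name and the toy spectra are hypothetical illustrations; they are not taken from the cited acoustic studies and do not reproduce their measurement settings.

```python
import math

def spectral_moments(freqs_hz, power):
    """Return (cog, sd, skewness), treating the power spectrum as a distribution."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]
    # First moment: centre of gravity (power-weighted mean frequency).
    cog = sum(f * w for f, w in zip(freqs_hz, weights))
    # Second central moment gives the spread around the COG.
    var = sum((f - cog) ** 2 * w for f, w in zip(freqs_hz, weights))
    sd = math.sqrt(var)
    # Third central moment, normalised by sd^3, gives the skewness.
    third = sum((f - cog) ** 3 * w for f, w in zip(freqs_hz, weights))
    skew = third / sd ** 3 if sd > 0 else 0.0
    return cog, sd, skew

# Toy spectra with invented values (assumption, for illustration only).
freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
high_energy = [1, 1, 1, 2, 4, 8, 10, 8]  # energy concentrated at high frequencies
low_energy = [2, 6, 10, 8, 4, 2, 1, 1]   # energy concentrated at lower frequencies

cog_high, _, skew_high = spectral_moments(freqs, high_energy)
cog_low, _, skew_low = spectral_moments(freqs, low_energy)
```

In this toy case the spectrum with lower-frequency energy has the lower COG and the higher skewness, which is the direction of difference described above for /ʂ, ɕ/ relative to /s/.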

Study design
The study design was an articulatory examination of tongue contours using ultrasound data collection techniques. Participants read sentences in Lower Sorbian with the target segments in them while they were being recorded with ultrasound. Tongue contours were compared using Generalized Additive Mixed Models (GAMMs). Data recording took place from March 27th, 2020 until April 1st, 2020 in Cottbus, Germany for the L2 learners. The advanced L2 learners, C04 and C05, were recorded at the University of Leipzig in Germany from April 4th until April 8th. The L1 speakers were recorded from July 18th, 2022 until July 22nd, 2022 in Cottbus, Germany.

Participants
As a baseline, one bilingual Sorbian/German speaker (male, 24) and one late-acquiring bilingual speaker of Sorbian (female, 40; age of first acquisition: 5) were recorded using ultrasound. These participants were chosen for this study because they both had significant input stimuli during the learning process from L1 speakers. Both speakers had input from L1-speaking relatives, and the older speaker additionally attended the Sorbian school at a time when L1 Sorbian-speaking teachers were active. Additionally, at the time of recording this data, few L1 speakers remain, and the advanced age of potential participants (above 80 years of age) makes ultrasound data especially difficult to record and interpret.
The criteria for language learner selection were that participants attended Dolnoserbski gymnazium Chóśebuz in Cottbus and were currently engaged in their language learning program. All participants had German as their first language. Participants were recruited for all three skill levels, A-, B-, and C-level learners, based on a scaling system like the CEFR. Their skill level was based on the Sorbian language class they attended at gymnazium Chóśebuz at the time of recording. Years of learning Lower Sorbian ranged from approximately 6 to 17 and were not necessarily reflective of the level of the speaker (i.e., more years did not necessarily reflect higher proficiency). Participant saturation was determined based on typical sample sizes for ultrasound studies. For baseline speakers, participants were selected on the basis that they had early exposure to Lower Sorbian and learned it in a natural setting (i.e., through hearing Lower Sorbian), although both participants also received an education in the Lower Sorbian school system. The Lower Sorbian speaking community is small, especially with respect to L1 speakers, and so as many L1 speakers as possible were recruited. The L2 learners consisted of 4 A-level, 6 B-level, and 5 C-level learners. With the exception of two extremely advanced C-level speakers, aged 35 and 56, these participants were aged 17-18. No participant had a self-reported history of speech or hearing disorders.

Procedure
All participants read and signed the ethics forms prior to the experiment. They were also verbally informed as to the structure of the experiment and informed of their primary rights as a participant, including that their de-identified data would be shared with other researchers, and that they could refuse data sharing if they wished.
Data for the baseline speakers were recorded in a quiet room in the Serbski Institut in Cottbus, Brandenburg. Data for the Lower Sorbian learners were recorded in a quiet room at Dolnoserbski gymnazium Chóśebuz in Cottbus, Brandenburg. Ultrasound data were recorded with the Micro system from Articulate Assistant Advanced (AAA). I used the 20 mm radius probe with a 92-degree field of view (FOV). Data were recorded at an average of 80 frames per second (fps). An ultrasound stabilization headset (Articulate Instruments Ltd., 2008) was also used to prevent movement of the ultrasound probe.
Participant forms were filled out prior to participation, including the questionnaire and consent forms. In order to pseudonymize participant data, participants were assigned a letter and number combination which corresponded to their skill level and the order in which they participated (e.g., C05 = the fifth C-level learner recorded; LS01 = the first L1 Lower Sorbian speaker recorded). Stimuli were presented using the AAA software package. Additionally, audio and video were synchronized and recorded using the AAA software. The full stimuli list is presented in Table 1. Stimuli were presented in a carrier phrase to facilitate more natural production. The carrier phrase was "Grońśo target hyšći raz" (please target say again). Stimuli were presented in a pseudorandomized order. Each participant produced 6 articulations of each segment in each of the three vocalic environments. This gives a total of 108 tokens for the L1 speakers (2 speakers × 3 segments × 3 vowels × 6 repetitions), 216 tokens for the A-level learners (4 speakers × 3 segments × 3 vowels × 6 repetitions), 324 tokens for the B-level learners (6 speakers × 3 segments × 3 vowels × 6 repetitions), and 270 tokens for the C-level learners (5 speakers × 3 segments × 3 vowels × 6 repetitions).
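The token counts above follow directly from the design (speakers × segments × vowels × repetitions). As a small arithmetic check, assuming only the group sizes and design constants stated in the text (the variable and function names are illustrative, not part of the study's materials):

```python
# Design constants taken from the Procedure description.
SEGMENTS = 3      # /s, ʂ, ɕ/
VOWELS = 3        # three vocalic environments
REPETITIONS = 6   # articulations per segment per environment

# Number of speakers per group, as reported in the text.
speakers_per_group = {"L1": 2, "A": 4, "B": 6, "C": 5}

def tokens_for(n_speakers):
    """Tokens contributed by a group of n_speakers under the stated design."""
    return n_speakers * SEGMENTS * VOWELS * REPETITIONS

token_counts = {group: tokens_for(n) for group, n in speakers_per_group.items()}
total_tokens = sum(token_counts.values())
```

Each speaker contributes 54 tokens, so the four groups together yield 918 tokens.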

Ethical considerations
Ethical approval was obtained from the Deutsche Gesellschaft für Sprachwissenschaft (DGfS #2021-13-220106) and informed written consent was obtained from all participants for the use and publication of their data.

Analysis
Tongue contours were manually traced using AAA software (v220.04.01) at the temporal midpoint of the fricative. The midpoint was identified based on the duration of the fricative, where the onset was measured as the offset of formants and periodic sound waves associated with the preceding vowel and the offset was determined as the reduction in aperiodic noise and dissipation of frication on the spectrogram associated with the fricative. Polar coordinates were then extracted. Tongue contours were then compared using a custom script (Heyne et al., 2019) for GAMM analysis of polar coordinates in R (R Core Team, 2023). GAMMs were performed using the mgcv package (Wood, 2011), which also provides summary statistics. Tongue contours were first compared for L1 speakers to provide a baseline for comparison. Tongue contours were then compared for each language group (A, B, and C). Group C was split into two: C-level and highly advanced C-level. GAMMs were performed with parametric fixed effects for segment (3 levels: /s, ʂ, ɕ/) and environment (3 levels: /i, a, u/). The interaction between segment and environment was also included. A smooth term was also included for segment and for the interaction between segment and environment. I included a factor smooth (i.e., a random effect) for the interaction between segment and speaker. The dependent variable was r, the distance of the coordinate from the probe origin, and each smooth included theta, the angle of the coordinate from the probe origin. For all smooths, cubic regression splines were used. The equation I used is printed in (1).
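Two of the steps described above are simple computations: locating the temporal midpoint of the fricative from its onset and offset, and working with contour points expressed in polar coordinates relative to the probe origin. The following is a minimal sketch, assuming each contour point is a distance paired with an angle in radians; the function names and all numeric values are hypothetical and are not taken from the AAA export or from the script of Heyne et al. (2019).

```python
import math

def temporal_midpoint(onset_s, offset_s):
    """Midpoint (in seconds) of a fricative spanning onset_s to offset_s."""
    if offset_s <= onset_s:
        raise ValueError("offset must follow onset")
    return onset_s + (offset_s - onset_s) / 2.0

def polar_to_cartesian(distance, angle_rad):
    """Convert a point given as (distance from origin, angle) to (x, y)."""
    return distance * math.cos(angle_rad), distance * math.sin(angle_rad)

# Hypothetical fricative boundaries in seconds.
midpoint = temporal_midpoint(0.412, 0.548)

# Hypothetical contour: three points as (distance in mm, angle in radians).
contour_polar = [
    (55.0, math.radians(60)),
    (60.0, math.radians(90)),
    (52.0, math.radians(120)),
]
contour_xy = [polar_to_cartesian(d, a) for d, a in contour_polar]
```

Converting to Cartesian coordinates is only needed for plotting or inspection in this sketch; the models described above are fitted to the polar coordinates themselves.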

L1 Speakers
Figure 1-Figure 2 below present the GAMM smooths for the L1 speakers of Lower Sorbian and Table 2-Table 3 present the approximate significance for the interaction between theta and segment. For full statistical print-outs, see Extended data (Howson, 2023). The adjusted R² values for the models were 0.979 and 0.983.
The GAMMs for the L1 speakers revealed a significant difference between all three segments, /s, ʂ, ɕ/. The tongue dorsum was most retracted for /s, ʂ/ and was more advanced for /ɕ/. The tongue contours for /s, ʂ/ were similar, but the tongue body was more raised for /ʂ/. /ɕ/ had the most raised tongue body, but it was not much more raised than /ʂ/.

A-Level learners
Figure 3 below presents the GAMM smooths for A-level learners of Lower Sorbian and Table 4 presents the approximate significance for the interaction between theta and segment. For individual plots and full statistical print-out, see Extended data (Howson, 2023). The adjusted R² for the model was 0.946. The general results for the A-level learners revealed that there was a significant difference between /s/ and /ʂ, ɕ/, but not between /ʂ/ and /ɕ/. This suggests that learners at the A-level share one tongue contour for their pronunciations of /ʂ, ɕ/. The general tongue contours indicated a more retracted tongue dorsum for /s/ than for /ʂ, ɕ/. The contours for /ʂ, ɕ/ had a slightly more advanced dorsum, with a raised tongue body, resembling /ʃ/, which is present in the L1 German.
Individual results revealed significant deviations in learners' articulation of /ʂ, ɕ/ when compared against the general tongue contour from the group level GAMM. Although none of the individual plots indicated that any of the learners had acquired the three-way contrast, there was significant variation in their articulation of /ʂ, ɕ/.

B-Level learners
Figure 4 below presents the GAMM smooths for B-level learners of Lower Sorbian and Table 5 presents the approximate significance for the interaction between theta and segment. For individual plots and full statistical print-out, see Extended data (Howson, 2023). The adjusted R² for the model was 0.957.
The GAMM results indicated that there was a significant difference between /s/ and /ʂ, ɕ/, but not between /ʂ/ and /ɕ/. This suggests that, like the A-level learners, the B-level learners also have not acquired the three-way contrast between /s, ʂ, ɕ/ with respect to their articulation. The general tongue contours reveal a more retracted tongue dorsum for /s/, with a lower tongue body than /ʂ, ɕ/. The contours for /ʂ, ɕ/ had a more rounded tongue shape, with more fronting and more posterior tongue body raising than for /s/.
Individual results also revealed variation in the articulation of /ʂ, ɕ/, although as with the A-level learners, there were no significant differences between /ʂ, ɕ/. In most cases, the tongue dorsum was more drawn back for /s/ and was more advanced for /ʂ, ɕ/. In some cases, the more anterior part of the tongue body was raised for /ʂ, ɕ/, while for some learners the more posterior part of the tongue body or the entire tongue body was more raised for /ʂ, ɕ/ than for /s/. This suggests that learners at the B-level continue to use the same segment for both /ʂ, ɕ/, although there was a great deal of variation in its realization.

C-Level learners
Figure 5 below presents the GAMM smooths for C-level learners of Lower Sorbian and Table 6 presents the approximate significance for the interaction between theta and segment. For individual plots and full statistical print-out, see Extended data (Howson, 2023). The adjusted R² for the model was 0.979.
The GAMM results for C-level learners indicated that there was a significant difference between /s/ and /ʂ, ɕ/, but not between /ʂ/ and /ɕ/. This suggests that, apart from the two highly advanced speakers analysed separately, learners of Lower Sorbian at all levels have not acquired the three-way contrast. The individual results showed variation in articulation between speakers and, in the case of the C-level learners, none of them showed the same backing of the tongue dorsum for /s/ compared to /ʂ/ and /ɕ/.
Highly advanced C-Level learners
Figure 6 and Figure 7 below present the GAMM smooths for highly advanced C-level learners of Lower Sorbian and Table 7 and Table 8 present the approximate significance for the interaction between theta and segment. The adjusted R² values for the models were 0.970 and 0.972, respectively. The Extended data (Howson, 2023) presents the full statistical print-outs of both models.
In both cases, the learners acquired a three-way contrast for /s, ʂ, ɕ/; however, the realization of /ʂ, ɕ/ varied for both speakers. In both cases, /s/ had the lowest tongue body, accompanied by a retracted tongue dorsum. /ʂ/ for C04 had a similar degree of retraction for the tongue dorsum as /s/, with a more raised tongue body. The tongue shape for /ʂ/ was faithful to the L1 pronunciation. /ɕ/ for C04 had a low and advanced tongue dorsum. This shape is likely due to the high degree of anterior tongue body advancement and raising. This tongue shape deviated significantly from the L1 pronunciation for /ɕ/. /ʂ/ for C05 had even more tongue dorsum retraction than /s/, with a raised posterior tongue body that had a downward sloping anterior tongue body. /ɕ/ for C05 had a more advanced tongue dorsum and tongue body. The posterior tongue body was raised, with a downward sloping anterior tongue body.

Discussion
The analysis revealed that for L1 speakers, there is a three-way contrast intact, but that for L2 learners, substitution of /ʃ/ for both /ʂ, ɕ/ occurred even for learners at the C-level. This was true for all learners except the most highly advanced C-level speakers. Both the PAM-L2 and SLM-r predict that such an assimilation would occur and that the contrast should be difficult to acquire because of the acoustic-perceptual similarity between the two. Nevertheless, in an immersion context, both models predict that it is possible for learners to acquire these contrasts. However, the observed learners were in a foreign-language context and the educators were primarily second language learners themselves. This means that there was likely varied input, and the lack of access to L1 input may have greatly hindered their acquisition. However, it should be noted that one limitation of the study is the relatively small wordlist, which makes it more difficult to assess category formation.
Learners in this dataset were not given specific pronunciation instructions. What this means is that learners only had access to any existing internal language learning mechanisms. Flege (1995) predicts that the mechanisms involved in L1 acquisition are still available for L2 learners. The evidence presented here does not disprove this but, at the least, it suggests that L1 interference in the acoustic-perceptual space (Kuhl, 1991; Kuhl et al., 1992; Kuhl & Iverson, 1995) significantly impedes language learning mechanisms if they are still accessible. The result is that the distortion of the perceptual space inhibits perceptual learning of L1 assimilated segments and thus hinders any alteration in articulatory patterns and novel category formation. As a result, learners merge Lower Sorbian /ʂ, ɕ/ into their L1 German /ʃ/ category. One caveat to note is that the current L2 instructors do not have the same level of fluency as the L1 instructors that the two advanced speakers (C04 and C05) had access to. As such, it is difficult to interpret how much input for the three-way sibilant contrast (if any) learners received. It is clear from discussions with the learners that pronunciation lessons are not a regular part of the curriculum. It remains very possible that the lack of acquisition of the three-way contrast is predominantly due to the lack of the three-way contrast in the input for learners. In short, the development of the language in the context of endangerment, revitalization, and its status as a minority language in the German context has possibly led to an inventory shift away from a three-way contrast to a more typical two-way contrast like the one observed in Upper Sorbian (Howson, 2017). If the desire of the community is to maintain specific speech patterns present in older L1 speakers, then from a practical standpoint, it seems that additional resources need to be committed to achieving this goal. This is at least true in the foreign language context but would undoubtedly assist in immersion contexts as well. This indicates that in language learning and preservation efforts, a multitude of resources should be employed to assist second language learners in the acquisition of L2 segments.
There is also the case of the two highly advanced speakers who have acquired a three-way contrast in their L2 speech. First and foremost, the speakers are much older, and as a result had a significant amount of input from L1 speakers during their acquisition processes. The increased access to authentic speech could have contributed to the eventual formation of novel categories. However, it is also important to note that Lower Sorbian /ɕ/ for both speakers appears to have been assimilated into the German /ʃ/ category. In terms of the PAM-L2, this would suggest a better goodness-of-fit match between /ɕ/ and /ʃ/. While /ʂ/ has spectral qualities similar to those of /ʃ/, /ɕ/ shares with /ʃ/ not only similar spectral qualities but also much more similar formant transitions. This suggests that at least a certain degree of perceptual dissimilarity must be present for the acquisition process to take place. When a segment is "good enough," rather than forming a novel category, the L1 category becomes linked (in SLM terms). Whether or not L1 phonological patterns are imported into L2 or if L2 influences L1 phonological patterns is unclear. Additionally, it remains unclear if phonetic linking occurs with a decoupling of phonological behaviour. As a result, the interaction in phonological patterning and effects between L1 and L2 linked segments needs to be explored further.

Extended data
This project contains the following extended data:
- extended_data_for_Howson_2023.pdf (full statistical print-outs and plots for all the models presented in this paper).

The response document says 'L2 learners' has been changed to 'L2 users'. This doesn't seem consistent through the manuscript. For me, although this is a terminological point, it represents a change in focus. These speakers are the future of Sorbian and are using the language, some of them as prominent community figures by the sound of it. I would consider this an essential change for describing at least the highly advanced C-level speakers.
I appreciate the clarifications made to the description of the statistics. For the significance testing, I can see that the model summary from the mgcv package has been used for the approximate significance values. It would be good to explain this approach over, for example, model comparison (Sóskuthy 2017; http://eprints.whiterose.ac.uk/113858/). I'm not suggesting that the models should be redone. Indeed this might not be practical or appropriate for the current study. But it would be good to know why this strategy for significance testing was chosen. Perhaps I am not understanding correctly.
The word list is quite short, which has implications for the interpretation of the results.It is hard to know whether speakers' productions relate to individual words, or relate to production of a system.This could be clearly highlighted in the methods section as well as a short sentence in the Conclusion.
Ultrasound data were not rotated to the occlusal plane.I accept that not everything can be done, and this can't be changed after data collection!But it should be mentioned in the text.
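For readers unfamiliar with the step, rotation to the occlusal plane is in essence a rigid 2D rotation of each tongue spline so that a reference line (for example, one traced from a bite-plate recording) becomes horizontal. The following is only a minimal sketch in Python of that geometry. It is not the author's pipeline and not a function of AAA or any other ultrasound software; the function name, the two reference points `front` and `back`, and the coordinate convention (x increasing towards the front of the mouth) are all assumptions made for illustration.

```python
import math


def rotate_to_occlusal_plane(spline, front, back):
    """Rotate (x, y) spline points so the line back -> front lies on the x-axis.

    `front` and `back` are hypothetical reference points on the occlusal
    plane (e.g. taken from a bite-plate trace). `back` becomes the origin.
    """
    # Angle of the occlusal line relative to the horizontal axis.
    angle = math.atan2(front[1] - back[1], front[0] - back[0])
    # Rotating by the negative of that angle levels the line.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)

    rotated = []
    for x, y in spline:
        dx, dy = x - back[0], y - back[1]  # translate so `back` is the origin
        rotated.append((dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a))
    return rotated
```

Applying the same rotation per speaker before fitting models is what makes tongue contours more comparable across speakers; without it, comparisons are best read within speakers only.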
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bilingualism, articulatory phonetics, ultrasound and acoustic analysis, sociolinguistics of language revitalisation (Scottish Gaelic in particular).
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

David Bolter
Denison University, Granville, Ohio, USA
As I noted in my original review, this article contains an articulatory study of the acquisition of the (voiceless) sibilant system of Lower Sorbian, including /s ʂ ɕ/, by speakers whose L1 or dominant language is German and whose baseline (voiceless) sibilant system includes /s ʃ/. In general, those speakers, whatever level they have on the Common European Framework of Reference, do not produce the unfamiliar /ʂ ɕ/ with distinct tongue profiles, as evidenced by the ultrasound imaging undertaken by this author. Although most learners at the A, B and C levels did not produce distinct tongue contours for /ʂ ɕ/, two advanced C-level learners did show evidence of distinct tongue contours for /ʂ ɕ/.
In assessing the changes to the version of the paper, I think that the author did a sufficient job revising the paper such that I would revise my recommendation to "Accept" with minimal changes necessary.
I appreciated that the author added more discussion about the fact we are dealing with articulatory data showing no contrast between /ʂ/ and /ɕ/ for the majority of L2 learners. However, it is correct to point out, as Nance & Nagamine do in their review, that this does not necessarily mean that they do not have two categories at the acoustic or perceptual level. I feel, however, that this issue is best addressed with future research projects. It sounds like the author does have audio and video data from the same research project that could be used to further evaluate Sorbian learners' ability to acquire /ʂ/ and /ɕ/.
I also feel that a future study could investigate these same individuals' productions in their German. Are they producing German /ʃ/ in the same way as they produce this merged /ʂ ~ ɕ/ category? I also very much appreciated the addition of a German language summary at the top. I'm wondering whether a Sorbian-language summary could be added and might be even more important to the community.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phonetics, Phonology, German Dialectology, Historical Linguistics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Version 1
Reviewer

David Bolter
Denison University, Granville, Ohio, USA
This article presents an interesting analysis of the degree to which German learners of Lower Sorbian can acquire the contrast between /s ʂ ɕ/ in that language, coming from a system that only contrasts /s ʃ/. The study uses ultrasound technology and finds that A-Level, B-Level and C-Level learners do not show a contrast in their articulation of (target) /ɕ/ and /ʂ/. Only highly advanced C-level learners demonstrate a clear contrast between (target) /ɕ/ and /ʂ/. However, these highly advanced speakers did not necessarily realize that in a fully target-like manner.
This is an interesting article that has interesting implications for the kinds of language contact situations where the smaller language is moribund. In my review, I concentrate my energies on questions regarding the type of German that the participants speak (do they contrast /ʃ/ and /ç/?) and the consequences this may have for their acquisition of Sorbian as well as the larger context of language contact situations where the sibilant inventories differ. On the latter point, I wonder specifically whether the authors might be able to offer any predictions on what might happen in some of the cases that I mention.
My recommendation is that the article be approved pending some revision, with the changes overall being mostly rather minor. I think most of the paper is very clear, but there are some paragraphs that I find confusing specifically in the "Methods" section.
I further clarify that I have not evaluated the statistical methods undertaken in this article and I encourage further reviewers to address this aspect of the paper.
Typographic / Clarity suggestions: (p. 4-5): The language in the participants section is confusing to me. I'm having a hard time understanding the speakers' linguistic backgrounds. For example, the authors write "All participants had a first language of German" (p. 5), but then a few sentences later, they discuss the data collected with L1 Sorbian speakers. Perhaps I am not understanding how the authors are L1 speakers, but this seems contradictory to me. Also, what exactly is meant by a "late-acquiring bilingual speaker of Sorbian" (p. 4)?
(p. 6): "The contours for /ʂ, ɕ/ were more rounded, fronted, and raised than for /s/." In this sentence, my assumption is that "rounded" is referring to rounding or arching of the tongue body, rather than to lip rounding. Is this correct?
(p. 9): "learners" is misspelled as "leaners" in the second line under "Discussion".

General comments:
I'm wondering about the system of fricatives used by the participants (and the surrounding area) when speaking (Standard?) German. For example, many speakers in Saxony (cf. AdA: https://www.atlas-alltagssprache.de/runde-2/f25c/) do not contrast between /ʃ/ and /ç/. I suppose Cottbus is a bit outside of this area, but nonetheless establishing their 'starting point' in German could be useful in assessing their acquisition. If any speakers are not distinguishing /ʃ/ and /ç/ when speaking Standard German, then I imagine this might affect their ability to learn a contrast between /ʂ/ and /ɕ/.
Either way, I believe that the article could benefit from some mention of this problem, specifically with regards to the hypothesis section. As an ancillary question, I'm wondering whether the data in question could be used to test the goodness of fit of L2 /ɕ/ and /ʂ/ to other L1 sounds (mostly L2 /ɕ/ to L1 /ç/, but a similar problem arises with /ʂ/ and other adjacent sounds in the German inventory).
On a very different note, I'm wondering if the authors have any thoughts on other types of language contact situations where the fricative inventories differ. An interesting comparandum (in more ways than one) that the author may wish to consider now or for future work, would be the sibilant systems of Basque in contact with (Peninsular) Spanish (cf. Beristain 2022). Here, the fricative systems in contact are different (Basque /s̺ s̻ ʃ/ vs. Peninsular Spanish /θ s̺/), but the /s̻/ may merge in either direction in different Basque varieties and this may have implications on how Spanish is spoken by those individuals.
Another language contact situation that could provide an interesting comparison would be the varieties of Sinitic languages in China. My understanding is that Sinitic languages of the north like Mandarin generally contrast /s ʂ ɕ/, whereas Sinitic languages of the central and southern regions have fewer sibilants. For example, Chen & Gussenhoven (2015) describe Shanghai Chinese as contrasting /s/-series and /ɕ/-series and Hong Kong Cantonese (Zee 1991) has only an /s/-series. My understanding is that L1 speakers of such varieties merge the /s/ and /ʂ/-series when speaking Mandarin.
All that said, I'm wondering if the author has any predictions regarding how language contact situations with differing sibilant systems such as those referenced in the preceding paragraphs might behave.

Are sufficient details of methods and analysis provided to allow replication by others? Yes
Are all the source data and materials underlying the results available? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phonetics, Phonology, German Dialectology, Historical Linguistics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 02 Feb 2024

Phil Howson
This article presents an interesting analysis of the degree to which German learners of Lower Sorbian can acquire the contrast between /s ʂ ɕ/ in that language, coming from a system that only contrasts /s ʃ/. The study uses ultrasound technology and finds that A-Level, B-Level and C-Level learners do not show a contrast in their articulation of (target) /ɕ/ and /ʂ/. Only highly advanced C-level learners demonstrate a clear contrast between (target) /ɕ/ and /ʂ/. However, these highly advanced speakers did not necessarily realize that in a fully target-like manner.
This is an interesting article that has interesting implications for the kinds of language contact situations where the smaller language is moribund. In my review, I concentrate my energies on questions regarding the type of German that the participants speak (do they contrast /ʃ/ and /ç/?) and the consequences this may have for their acquisition of Sorbian as well as the larger context of language contact situations where the sibilant inventories differ. On the latter point, I wonder specifically whether the authors might be able to offer any predictions on what might happen in some of the cases that I mention.
My recommendation is that the article be approved pending some revision, with the changes overall being mostly rather minor. I think most of the paper is very clear, but there are some paragraphs that I find confusing specifically in the "Methods" section.
I further clarify that I have not evaluated the statistical methods undertaken in this article and I encourage further reviewers to address this aspect of the paper.
Typographic / Clarity suggestions: (p. 4-5): The language in the participants section is confusing to me. I'm having a hard time understanding the speakers' linguistic backgrounds. For example, the authors write "All participants had a first language of German" (p. 5), but then a few sentences later, they discuss the data collected with L1 Sorbian speakers. Perhaps I am not understanding how the authors are L1 speakers, but this seems contradictory to me. Also, what exactly is meant by a "late-acquiring bilingual speaker of Sorbian" (p. 4)?
Response: I changed L1 to "baseline speakers" to be consistent throughout. The two baseline speakers are also described in more detail: "These participants were chosen for this study because they both had significant input stimuli during the learning process from L1 speakers. Both speakers had input from L1 speaking relatives and additionally the older speaker attended the Sorbian school at a time when L1 Sorbian speaking teachers were active. Additionally, at the time of recording this data, few L1 speakers remain, and the advanced age of potential participants (above 80 years of age) makes ultrasound data especially difficult to record and interpret."
(p. 6): "The contours for /ʂ, ɕ/ were more rounded, fronted, and raised than for /s/." In this sentence, my assumption is that "rounded" is referring to rounding or arching of the tongue body, rather than to lip rounding. Is this correct?
Response: Now reads: "The contours for /ʂ, ɕ/ had more rounded tongue shape, with more fronting, and more posterior tongue body raising than for /s/."
(p. 9): "learners" is misspelled as "leaners" in the second line under "Discussion".
General comments: I'm wondering about the system of fricatives used by the participants (and the surrounding area) when speaking (Standard?) German. For example, many speakers in Saxony (cf. AdA: https://www.atlas-alltagssprache.de/runde-2/f25c/) do not contrast between /ʃ/ and /ç/. I suppose Cottbus is a bit outside of this area, but nonetheless establishing their 'starting point' in German could be useful in assessing their acquisition. If any speakers are not distinguishing /ʃ/ and /ç/ when speaking Standard German, then I imagine this might affect their ability to learn a contrast between /ʂ/ and /ɕ/.
Response: This is difficult to say with the current data set. What can be ascertained from the data provided here is that there is tremendous variation in the way individual speakers produce /ʂ, ɕ/ and that overwhelmingly the tongue contours overlap in a way as to suggest no significant differences in the way they are articulated.
I believe that the article could benefit from some mention of this problem, specifically with regards to the hypothesis section.As an ancillary question, I'm wondering whether the data in question could be used to test the goodness of fit of L2 /ɕ/ and /ʂ/ to other L1 sounds (mostly L2 /ɕ/ to L1 /ç/, but a similar problem arises with /ʂ/ and other adjacent sounds in the German inventory).
Response: With the current dataset, it's not really possible to do this type of comparison. It is an interesting suggestion and examinations of German L1 learners of a language with the three-way sibilant contrast would benefit strongly from incorporating these data.
On a very different note, I'm wondering if the authors have any thoughts on other types of language contact situations where the fricative inventories differ. An interesting comparandum (in more ways than one) that the author may wish to consider now or for future work, would be the sibilant systems of Basque in contact with (Peninsular) Spanish (cf. Beristain 2022). Here, the fricative systems in contact are different (Basque /s̺ s̻ ʃ/ vs. Peninsular Spanish /θ s̺/), but the /s̻/ may merge in either direction in different Basque varieties and this may have implications on how Spanish is spoken by those individuals.
Another language contact situation that could provide an interesting comparison would be the varieties of Sinitic languages in China. My understanding is that Sinitic languages of the north like Mandarin generally contrast /s ʂ ɕ/, whereas Sinitic languages of the central and southern regions have fewer sibilants. For example, Chen & Gussenhoven (2015) describe Shanghai Chinese as contrasting /s/-series and /ɕ/-series and Hong Kong Cantonese (Zee 1991) has only an /s/-series. My understanding is that L1 speakers of such varieties merge the /s/ and /ʂ/-series when speaking Mandarin. All that said, I'm wondering if the author has any predictions regarding how language contact situations with differing sibilant systems such as those referenced in the preceding paragraphs might behave.
Response: I think language contact situations can differ significantly based on the communities. Certainly, there are theoretical considerations along with those, such that we may assume or observe tendencies towards certain patterns of acquisition and assimilation.
In keeping with the theoretical claims in this paper, the most likely assimilatory patterns are those that adhere to the assimilation of segments with the greatest acoustic-perceptual overlap.
Competing Interests: No competing interests were disclosed.

Hypothesis:
This section gives predictions for the study based on the PAM-L2 and SLM(-r). However, the current study is all about articulation with no examination of the acoustic data. Some thinking needs to go into this about the targets that speakers are aiming for. Are they aiming for an acoustic target or an articulatory target in acquiring Sorbian?
Our understanding is that while the PAM-L2 has a prediction about articulation (that speakers directly perceive gestures and acquire these), the SLM-r is less specific about articulation. Clarify this distinction somewhere in the lit review as it is important for the hypothesis here.
In the hypothesis paragraph, you mention quite a lot about formant transitions and spectral characteristics of the fricatives. This can't be tested in the current design, which is purely articulatory. We suggest rethinking about how this material fits into the articulatory message of this paper, and the links between acoustics and articulation.
This is important for the rest of the study. For example, on p9 you refer to speakers not acquiring the contrast. The data here refer to midsagittal tongue splines at fricative midpoint only. It is possible that speakers are making an acoustic contrast in some other way. Unlikely perhaps, but some thought needs to go into what we can and can't ascertain from these data.

Discussion:
There is an argument here and in the Results that Sorbian speakers are substituting German /ʃ/ for two of the Sorbian fricatives. This seems impossible to know from the current data, since German /ʃ/ isn't analysed. It is possible that Sorbian learners are doing something that is different from L1 Sorbian and also different from German. See Moore et al. (2018) for a similar example where Japanese L1 English L2 speakers produce a sound which is not like L1-English /l/ and /ɹ/, but is also different from their Japanese [ɾ]. Moore et al. (2018). https://doi.org/10.1250/ast.39.75

Open Science:
We are grateful to the author for providing the datasheets used. In order for future researchers to fully replicate this study and understand all the statistics carried out, it would be very helpful to have the code used as well.

Abstract:
This reads more like the first page of the Introduction. It would be helpful to make this more of an 'advert' for the study. Include more of the findings here. A major contribution of the work, and a major feature of the study design, is comparing the different proficiency levels for speakers. This should be mentioned in the Abstract.

Participants p5:
Could you specify the age and gender of participants in the main text instead of the supplementary materials, as this seems like key information. Explain a little bit more about language learning trajectories and how some participants have spent a long time at A-level compared to others who have reached B-level comparatively faster. In a context of language revitalisation learning, this isn't entirely surprising, but might be less evident to readers coming from a majority-language teaching background.
How was the level of the learners assessed? Is this through a standardised test or teacher perceptions?
Make it clear that these scores refer to CEFR scales (we assume), otherwise this can be a bit confusing. For example, as a British reader, it is hard not to read 'A-Level' as referring to the standard tests taken by 18-year-old school leavers in most of the UK. These are referred to as 'A-Levels'.
How was the level of the advanced C2 speakers ascertained? Is it still appropriate to refer to them as 'learners' if they are so advanced? Suggest replacing with 'L2 users' or 'L2 speakers'.
'one of which has achieved a near-native level of fluency' - how was this ascertained? Suggest replacing with 'L1-like level of fluency'.

Procedure:
Second paragraph at the bottom of p5: 'Data for the bilingual speakers' - aren't all of them bilingual?

Stimuli:
This is quite a small list for a study to make claims about category learning. As there is only one word per context, it is not possible to disentangle category learning, and learning of individual words. Some of the words are presumably more frequent than others, which will likely affect acquisition for the less advanced speakers. This needs to be acknowledged in the study text.

Analysis:
It would be helpful to have a little more information about the ultrasound data here: Was any manual correction carried out for the tongue splines obtained from AAA? Were the data rotated to the occlusal plane? Which version of AAA was used for the analysis?
Could you give a bit more information about how significance testing was carried out? Via model comparison?

Results: A-Level learners:
'A-Level learners share one phoneme' - rephrase. They share one tongue shape. We don't know about the perceptual categories needed for phoneme-hood.
Final paragraph of this section: is the 'significant variation in their articulation of /ʃ/' a typo?

B-Level learners:
'have not acquired the three-way contrast' - make it clear that this refers to articulation, as we don't know about their acoustics/perception from this study.
First paragraph on p9: 'B-level learners continue to use German /ʃ/'. It's not possible to draw this conclusion from these data as they don't include analysis of German /ʃ/.
/ʃ/ often involves significant lip rounding. This study doesn't consider lip rounding analysis. This should be taken into account when drawing conclusions about likely German influence.

Highly advanced C-Level learners:
It seems there is some inconsistency in labelling these participants between the main text, and the supplementary materials.Consider using 'C04' and 'C05' throughout instead of 'C201' and 'C202'.

Tongue spline figures:
Consider changing the red/orange colour scheme to maximise contrast for readers.

Discussion:
It is suggested here that L2 Sorbian learners experience 'L1 interference in the acoustic-perceptual space'. But at the same time, the text says that Sorbian teachers might not produce this contrast themselves. So, it is possible that learners are doing a really amazing job of repeating exactly what their teachers produce and aren't experiencing L1 influence directly themselves at all. Could you explain how these things can be disentangled, or what is the most likely interpretation of the data?
'contrasts with difficult to perceive differences require specific training to acquire'. Could you give some evidence from the L2 pronunciation literature to back this up? This isn't my exact area of specialism, but my impression is that L2 pronunciation training is extremely difficult and might not be 'successful' in creating L1-like pronunciations.
At the same time, thought needs to be given to what is a realistic target for L2 users and language revitalisation speakers. In applied circles, it is now usually considered more appropriate to aim for intelligible and comprehensible L2 speech, rather than an L1 target for pronunciation.
You conclude that intervention and training is needed here and in the plain language summary. But the advanced C-Level speakers did produce this contrast without having had special training. It could therefore be argued that more language exposure and use is required instead.

majority of L2 speakers exhibited two-way contrasts between /s/ and /ʂ ɕ/, arguably due to influence of L1 German. Only advanced L2 users of Lower Sorbian produced the three-way fricative contrast.

Strengths of this study
It is wonderful to see more articulatory work on Sorbian! It's so important to have this kind of language documentation, as well as a detailed investigation of speech production in a language revitalisation setting. We haven't previously seen analysis of language revitalisation users at different proficiency levels, so this is an important contribution of the work. There is very little articulatory work carried out with bilinguals/L2 users, so this work is a significant addition to the field. We suggest some revisions to the final version of the paper. This mainly concerns some rephrasing of the framing, more details on aspects of the methods and analysis, and greater acknowledgement of what can and can't be concluded from this dataset.

Major points
Sociolinguistically, this is a context of extreme language endangerment and some revitalisation. It is to be expected that speakers using the language as an L2 won't sound exactly the same as older people who acquired it in a completely different social context. Suggest some rephrasing and acknowledgement of this throughout. For example, removing references to speakers not producing sounds 'properly' etc. Replace with them sounding 'different' to Sorbian acquired two generations ago in an L1 setting.
Response: Thank you for this, I think it is an important change to make. I have changed the language throughout.
We appreciate that sociolinguistics is not the primary framing of the work […]
Hypothesis: This section gives predictions for the study based on the PAM-L2 and SLM(-r). However, the current study is all about articulation with no examination of the acoustic data. Some thinking needs to go into this about the targets that speakers are aiming for. Are they aiming for an acoustic target or an articulatory target in acquiring Sorbian?
Response: I have added the following text: "In terms of the PAM-L2 (Best & Tyler, 2007), the assumption is that learners are perceiving articulatory gestures and vocal tract changes, not more abstract acoustic characteristics. The implication of this is that as learners become more advanced, they become better at retrieving the articulatory movements necessary to produce a contrast. The expectation is that gradual improvement in the articulation of L2 segments should occur."
Our understanding is that while the PAM-L2 has a prediction about articulation (that speakers directly perceive gestures and acquire these), the SLM-r is less specific about articulation. Clarify this distinction somewhere in the lit review as it is important for the hypothesis here.
Response: I have added the following text: "The PAM-L2 is a direct realist model, which assumes that perception is related to the perception of distal articulatory events (i.e., changes in vocal tract configurations), not specific acoustic patterns." and: "Additionally, the PAM-L2 posits that learners attend to gestural movements in the vocal tract, while the SLM-r suggests that learners pay attention to acoustic differences in the input signal directly. Thus, under the view of the SLM-r, articulation is a matter of better navigation of what vocal tract shapes produce the target acoustic outputs."
In the hypothesis paragraph, you mention quite a lot about formant transitions and spectral characteristics of the fricatives. This can't be tested in the current design, which is purely articulatory. We suggest rethinking about how this material fits into the articulatory message of this paper, and the links between acoustics and articulation.
We are grateful to the author for providing the datasheets used. In order for future researchers to fully replicate this study and understand all the statistics carried out, it would be very helpful to have the code used as well.
Minor points
Abstract: This reads more like the first page of the Introduction. It would be helpful to make this more of an 'advert' for the study. Include more of the findings here. A major contribution of the work, and a major feature of the study design, is comparing the different proficiency levels for speakers. This should be mentioned in the Abstract.
Response: I have added the following: "The ultrasound data revealed that learners in the contemporary context do not produce a distinction between /ʂ, ɕ/ and only learners at an advanced level who had significant exposure to L1 speakers have acquired a three-way sibilant distinction."
Plain language summary: This is still quite technical for non-linguists to read, for example references to IPA terminology. Suggest some rephrasing such that this summary would be useful to Sorbian community members (e.g. give example words for IPA). Would it be possible to translate into German and/or Sorbian as well for maximum readability by Sorbian users?

Response:
From what I can derive from Maddieson (1984), specific numbers on the three-way sibilant contrast are not provided. Numbers for languages with one or more fricatives are provided, but which segments those are is not specified. This is more complex when voiced and voiceless pairs are considered 2 segments and non-sibilant segments are included. Thus, minimally, we would expect that any of the numbers for 6+ fricatives could apply, but the numbers then vary. I could estimate that, in all likelihood, of the languages examined by Maddieson (1984), less than 6% have a three-way sibilant contrast. That being said, this would be an estimation based on the available data.
A lot in this analysis hangs on /ʂ/ and /ɕ/ being more acoustically/perceptually similar to /ʃ/ than /s/.Is there any way you can demonstrate this?
Response: I have added the following to page ?: "The acoustics between the two segments /ʂ, ɕ/ resemble each other in COG and skewness, having both a lower COG and higher skewness than /s/. Both values also significantly overlapped with each other for /ʂ, ɕ/. The feature in Lower Sorbian that was found to most strongly distinguish /ʂ, ɕ/ from each other was a much higher transitional F2 into the following vowel for /ɕ/ compared to /ʂ/ (Howson, 2015). The lower COG values observed in Lower Sorbian tend to match crosslinguistic COG associated with /ʃ/ (Żygis, 2010) and COG and skewness measures associated with German /ʃ/ (Weirich & Simpson, 2015)."
Second language acquisition: The material in this section comes a bit abruptly after the Introduction. Could it be linked in a little more?
Response: I have added the following: "Many theories of language acquisition, such as the PAM-L2 (Best & Tyler, 2007) and the SLM-r (Flege & Bohn, 2021), have postulated how different aspects of acoustic-perceptual similarities with L1 segments impact L2 acquisition. Contrasts such as the three-way contrast are, under these theories, the most difficult to acquire due to the acoustic similarities between the segments. This makes Lower Sorbian an excellent language to examine foreign language acquisition of sibilant fricatives."
PAM-L2: P3, final paragraph, column 2: Suggest rephrasing 'poverty of stimulus'. We're talking about a highly endangered language where people are doing their very best to revitalise and transmit the language however they can. A classroom will be different to a home setting, but 'poverty of stimulus' sounds very negative, perhaps unnecessarily so.
Response: I have changed this to: "The reason for this is because of a reduced access to consistent stimuli and the phonetic contrasts that distinguish them."
Similarly, the reference to teachers 'not properly producing' target segments. Maybe rephrase to acknowledge that L2 users will speak differently to L1 users, but this is what we would expect anyway.
Response: I have changed this to read the following: "Many second language classrooms are also taught by second language speakers, who may or may not consistently produce the language relevant contrasts, and likely produce contrasts differently than the older generation of L1 speakers."
Could the discussion about lexical frequency and word learning vs. category learning (Tyler 2019) be linked into the current study? This material seems perhaps less relevant for a word list task such as the current work. See also comments further down about the stimuli.
Speech Learning Model: Second paragraph tells us that the target population are late-acquiring bilinguals. Mention this earlier. It would be good to define what is meant by 'late-acquiring bilingual'.
Response: I have added the following: "The SLM posits that for late acquiring bilinguals (i.e., someone who acquired two languages as a child, but the second language was acquired later than the first)" learned by L2 learners.
At the same time, thought needs to be given to what is a realistic target for L2 users and language revitalisation speakers. In applied circles, it is now usually considered more appropriate to aim for intelligible and comprehensible L2 speech, rather than an L1 target for pronunciation.
Response: I think this depends on the specific context and goals/aims of the group in question. From personal work, I can say it is very important to many indigenous communities that they maintain pronunciation similar/the same as it used to be spoken. I would not say this is an unachievable goal, but if it is the goal, then certainly specific measures to achieve that goal need to be taken. If it is not the wish of the community to maintain that pronunciation, then it is also not something that should realistically be expected or worked towards. I have added the following text: "If the desire of the community is to maintain specific speech patterns present in older L1 speakers, then from a practical standpoint, it seems that additional resources need to be committed to achieve this goal."
You conclude that intervention and training is needed here and in the plain language summary. But the advanced C-Level speakers did produce this contrast without having had special training. It could therefore be argued that more language exposure and use is required instead.

Response: True, but realistically, that's not possible given the present situation in Lower Sorbian. Few L1 speakers remain and while I agree that significant exposure to L1 speech with such a contrast would/could facilitate learning, that simply isn't the situation in the present context. If the goal of the Sorbian community is to maintain this contrast (and others), then specific resources would need to be mobilized as it is not realistic or even possible that children or adults would receive significant input stimuli from L1 speakers or even L2 speakers who produce the target contrasts reliably.
Competing Interests: No competing interests were disclosed.

I still find the distinction between 'L1 speakers' and 'baseline speakers' confusing, I'm afraid. It is clear that it would not be possible to ultrasound 80+ year old L1 speakers. My understanding is that the 'baseline' speakers in this study are sequential bilinguals who have achieved very high proficiency in Sorbian, with lots of L1 Sorbian input. That's completely fine. But it's confusing to refer to them as 'L1 speakers'. Suggest changing to 'Baseline speakers' or similar consistently throughout. Without this change in wording, it's a bit confusing to read the study, I'm afraid.
Report 31 August 2023 https://doi.org/10.21956/openreseurope.16097.r34213 © 2023 Bolter D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1: Could you give a bit more information about what is shown in this figure? Are the arrows relevant to a particular bit of the waveform? What do the spectrograms show here which informs the reader beyond IPA symbols and arrows? Suggest changing to a schematised IPA diagram instead unless the spectrograms illustrate something specific.

Table 8. Approximate significance of smoothing term Theta by Segment for C05.
Ideally, this would involve perceptual training that would cater to the speaker's L1 segments and assist in training the learner in distinguishing their existing L1 categories and L2 categories. This would also be accompanied by specific instructions on how the target segments are produced. Ultrasound technology has been used in this context both for direct visualization of how the learner produces the contrast themselves and how they should produce the contrasts (Antolík et al., 2019) as well as providing visual instruction guides for learners (Bliss et al., 2018).
It is sometimes said that Standard German /ʃ/ is labialized [ʃʷ] (Krech et al. 2009: 81-83), perhaps to amplify the contrast with /ç/ (?). If this is so, then a labialized [ʃʷ] might not be all that different from a Slavic [ʂ], especially since some researchers dispute the label of "retroflex" being applied to a language like Polish on the grounds that Polish /ʂ/ has a relatively flat tongue profile (see the debate between Hamann 2004, Żygis et al. 2012 and Ćavar & Lulich 2020).
here, but if you wanted to consider this aspect in more detail, you could investigate work conducted in the new speaker framework. It seems that 'new speakers' would be a very appropriate way to describe these Sorbian L2 users. Some papers we found helpful: O'Rourke & Ramallo (2011), https://doi.org/10.1075/lplp.35.2.03oro; Jaffe (2015), https://doi.org/10.1515/ijsl-2014-0030
Response: I have added the following text: "In terms of the PAM-L2 (Best & Tyler, 2007), the assumption is that learners are perceiving articulatory gestures and vocal tract changes, not more abstract acoustic characteristics. The implication of this is that as learners become more advanced, they become better at retrieving the articulatory movements necessary to produce a contrast. The expectation is that gradual improvement in the articulation of L2 segments should occur. In terms of the SLM-r (Flege & Bohn, 2021), there is a similar expectation. As learners' acoustic-perceptual representation improves, so too should articulation."
This is important for the rest of the study. For example, on p9 you refer to speakers not acquiring the contrast. The data here refer to midsagittal tongue splines at fricative midpoint only. It is possible that speakers are making an acoustic contrast in some other way. Unlikely perhaps, but some thought needs to go into what we can and can't ascertain from these data.
Discussion: There is an argument here and in the Results that Sorbian speakers are substituting German /ʃ/ for two of the Sorbian fricatives. This seems impossible to know from the current data, since German /ʃ/ isn't analysed. It is possible that Sorbian learners are doing something that is different from L1 Sorbian and also different from German. See Moore et al. (2018) for a similar example where Japanese L1 English L2 speakers produce a sound which is not like L1-English /l/ and /ɹ/, but is also different from their Japanese [ɾ]. Moore et al. (2018). https://doi.org/10.1250/ast.39.75
Open Science: